Abstract. The running maximum-minimum (max-min) filter computes the maxima and 
minima over running windows of size w. This filter has numerous applications in signal 
processing and time series analysis. We present an easy-to-implement online algorithm 
1. Introduction 

The maximum and the minimum are the simplest forms of order statistics. Computing 
either the global maximum or the global minimum of an array of n elements 
requires n - 1 comparisons, or slightly less than one comparison per element. 
However, to compute simultaneously the maximum and the minimum, only 
$3\lceil n/2 \rceil - 2$ comparisons are required in the worst case [Cormen et al. 2001], or 
slightly less than 1.5 comparisons per element. 
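
The pairing idea behind the $3\lceil n/2 \rceil - 2$ bound is the standard textbook technique: order each pair of elements with one comparison, then compare only the smaller one with the running minimum and only the larger one with the running maximum. A minimal C++ sketch follows; the names MinMaxResult and min_max_pairs are hypothetical, and the counter is there only to make the comparison budget visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: global min, global max, and comparisons spent.
struct MinMaxResult {
  double min;
  double max;
  std::size_t comparisons;
};

// Simultaneous min/max by pairs: one comparison orders a pair, then only the
// smaller value is compared with the running min and only the larger one with
// the running max, i.e. 3 comparisons per pair instead of 4.
MinMaxResult min_max_pairs(const std::vector<double>& a) {
  assert(!a.empty());
  MinMaxResult r{a[0], a[0], 0};
  std::size_t k = 1;
  if (a.size() % 2 == 0) {  // even n: the first pair seeds min and max
    ++r.comparisons;
    if (a[0] < a[1]) r.max = a[1];
    else r.min = a[1];
    k = 2;
  }
  for (; k + 1 < a.size(); k += 2) {
    ++r.comparisons;
    const bool first_is_lower = a[k] < a[k + 1];
    const double lo = first_is_lower ? a[k] : a[k + 1];
    const double hi = first_is_lower ? a[k + 1] : a[k];
    r.comparisons += 2;
    if (lo < r.min) r.min = lo;
    if (hi > r.max) r.max = hi;
  }
  return r;
}
```

On the seven values {3, 9, -2, 7, 5, 0, 8}, the loop spends 3 comparisons on each of the 3 pairs, 9 in total, which stays within $3\lceil 7/2 \rceil - 2 = 10$.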

A related problem is the computation of the running maximum-minimum (max-min) 
filter: given an array $a_1, \ldots, a_n$, find the maximum and the minimum over all 
windows of size w, that is, $\max/\min_{i \in [j, j+w)} a_i$ for all j (see Fig. 1.1). The running 
maximum (max) and minimum (min) filters are defined similarly. The max-min 
filter problem is harder than the global max-min problem, but a tight bound on the 
number of comparisons required in the worst case remains an open problem. 
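
Read literally, the definition yields the naive filter listed in Table I: scan every window from scratch, spending w - 1 comparisons on its maximum and w - 1 on its minimum, or 2w - 2 per element. The sketch below assumes 0-based C++ vectors, so window j covers a[j], ..., a[j + w - 1]; naive_maxmin is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Naive running max-min filter, read directly from the definition:
// out[j] = (max, min) of a[j], ..., a[j + w - 1] (0-based windows).
std::vector<std::pair<double, double>> naive_maxmin(const std::vector<double>& a,
                                                    std::size_t w) {
  assert(w >= 1 && w <= a.size());
  std::vector<std::pair<double, double>> out;
  out.reserve(a.size() - w + 1);
  for (std::size_t j = 0; j + w <= a.size(); ++j) {
    double mx = a[j], mn = a[j];
    // w - 1 comparisons for the maximum and w - 1 for the minimum.
    for (std::size_t k = j + 1; k < j + w; ++k) {
      if (a[k] > mx) mx = a[k];
      if (a[k] < mn) mn = a[k];
    }
    out.emplace_back(mx, mn);
  }
  return out;
}
```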

Running maximum-minimum (max-min) filters are used in signal processing and 
pattern recognition. As an example, Keogh and Ratanamahatana [2005] use a 
precomputed max-min filter to approximate the time warping distance between two 
time series. Time series applications range from music retrieval [Zhu and Shasha 
2003] to network security [Sun et al. 2004]. The unidimensional max-min filter 
can be applied to images and other bidimensional data by first applying the 
unidimensional filter on rows and then on columns. Image processing applications 
include cancer diagnosis [He et al. 2005], character [Ye et al. 2001] and handwriting 
[Ye et al. 2001] recognition, and boundary feature comparison [Taycher and Garakani 
2004]. 

Fig. 1.1: Example of a running max-min filter. 

We define the stream latency of a filter as the maximum number of data points 
required after the window has passed. For example, an algorithm requiring that 
the whole data set be available before the running filter can be computed has a 
high stream latency. In effect, the stream latency measures where an algorithm lies 
on the batch/online scale. We quantify the speed of an algorithm by the number of 
comparisons between values, either a < b or b < a, where values are typically 
floating-point numbers. 

We present the first algorithm to compute the combined max-min filter in no more 
than 3 comparisons per element, in the worst case. Indeed, we are able to save 
some comparisons by not treating the max-min filter as the aggregate of the max 
and min filters: if x is strictly larger than k other numbers, then there is no need to 
check whether x is smaller than any of these numbers. Additionally, it is the first 
algorithm to require a constant number of comparisons per element without any 
stream latency and it uses less memory than competitive alternatives. Further, our 
algorithm requires no more than 2 comparisons per element when the input data 
is monotonic (either non-increasing or non-decreasing). We provide experimental 
evidence that our algorithm is competitive and can be substantially faster (by a 
factor of 2) when the input data is piecewise monotonic. A perhaps surprising result 
is that our algorithm is arguably simpler to implement than the recently proposed 
algorithms such as Gil and Kimmel [2002] or Droogenbroeck and Buckley [2005]. 
Finally, we prove that at least 2 comparisons per element are required to compute 
the max-min filter when no stream latency is allowed. 
Table I: Worst-case number of comparisons and stream latency for competitive max-min filter 
algorithms. Stream latency and memory usage (buffer) are given in number of elements. 

algorithm                                comparisons per element (worst case)   stream latency   buffer
naive                                    2w - 2                                 0                O(1)
van Herk [1992], Gil and Werman [1993]   6 - 8/w                                w                4w + O(1)
Gil and Kimmel [2002]                    3 + 2 log w/w + O(1/w)                 w                6w + O(1)
New algorithm                            3                                      0                2w + O(1)



2. Related Work 

Pitas [1989] presented the max filter algorithm maxline requiring O(log w) 
comparisons per element in the worst case and an average-case performance over 
independent and identically distributed (i.i.d.) noise data of slightly more than 3 
comparisons per element. Douglas [1996] presented a better alternative: the max filter 
algorithm maxlist was shown to average 3 comparisons per element for i.i.d. input 
signals, and Myers and Zheng [1997] presented an asynchronous implementation. 

More recently, van Herk [1992] and Gil and Werman [1993] presented an algorithm 
requiring 6 - 8/w comparisons per element, in the worst case. The algorithm 
is based on the batch computation of cumulative maxima and minima over 
overlapping blocks of 2w elements. For each filter (max and min), it uses a memory 
buffer of 2w + O(1) elements. We will refer to this algorithm as the 
van Herk-Gil-Werman algorithm. Gil and Kimmel [2002] proposed an improved version 
(Gil-Kimmel) which lowered the number of comparisons per element to slightly 
more than 3, but at the cost of some added memory usage and implementation 
complexity (see Table I and Fig. 2.2 for a summary). For i.i.d. noise data, Gil and 
Kimmel presented a variant of the algorithm requiring 
$\approx 2 + (2 + \ln 2/2)\log w/w$ comparisons per element (amortized), but with the same 
worst-case complexity. Monotonic data is a worst-case input for the Gil-Kimmel 
variant. 
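
To make the block idea concrete, here is a hedged sketch of a max filter in the spirit of van Herk-Gil-Werman. It is not the authors' code: it uses the common formulation with non-overlapping blocks of w elements, computes block-wise prefix and suffix maxima for the whole array in one batch, and handles the max filter only (the min filter is symmetric). The function name vhgw_max is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Block-based running max (van Herk / Gil-Werman idea), computed in batch:
// out[j] = max(a[j], ..., a[j + w - 1]) for j = 0, ..., n - w.
std::vector<double> vhgw_max(const std::vector<double>& a, std::size_t w) {
  const std::size_t n = a.size();
  assert(w >= 1 && w <= n);
  std::vector<double> prefix(n), suffix(n);
  // prefix[k]: maximum from the start of k's block (a multiple of w) up to k.
  for (std::size_t k = 0; k < n; ++k)
    prefix[k] = (k % w == 0) ? a[k] : std::max(prefix[k - 1], a[k]);
  // suffix[k]: maximum from k up to the end of k's block (or of the array).
  for (std::size_t k = n; k-- > 0;)
    suffix[k] = (k + 1 == n || (k + 1) % w == 0) ? a[k]
                                                 : std::max(suffix[k + 1], a[k]);
  // Each window overlaps at most two consecutive blocks: combine the suffix
  // of the first block with the prefix of the second.
  std::vector<double> out(n - w + 1);
  for (std::size_t j = 0; j + w <= n; ++j)
    out[j] = std::max(suffix[j], prefix[j + w - 1]);
  return out;
}
```

Because a window of w consecutive elements overlaps at most two consecutive blocks, one final comparison per window combines the two partial maxima.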

Droogenbroeck and Buckley [2005] proposed a fast algorithm based on anchors. 
They do not improve on the number of comparisons per element. For window sizes 
ranging from 10 to 30 and data values ranging from 0 to 255, their implementation 
has a running time lower than their van Herk-Gil-Werman implementation by as 
much as 30%. Their Gil-Kimmel implementation outperforms their van Herk-Gil-Werman 
implementation by as much as 15% for window sizes larger than 15, but 
is outperformed by a similar margin for smaller window sizes, and both are comparable 
for a window size equal to 15. The Droogenbroeck-Buckley min filter pseudocode 
alone requires a full page compared to a few lines for the van Herk-Gil-Werman 
algorithm. Their experiments did not consider window sizes beyond w = 30 nor 
arbitrary floating-point data values. 
Fig. 2.2: Worst-case number of comparisons per element with the van Herk-Gil-Werman (van Herk) 
algorithm, the Gil-Kimmel algorithm, and our new streaming algorithm (less is better). 



3. Lower Bounds on the Number of Comparisons 

Gil and Kimmel [2002] showed that the prefix max-min problem (computing 
$\max/\min_{i \le j} a_i$ for all j) requires at least $\log_2 3 \approx 1.58$ comparisons per element, 
while they conjectured that at least 2 comparisons are required. We prove that their 
result applies directly to the min-max filter problem and show that 2 comparisons per 
element are required when no latency is allowed. 

Theorem 1. In the limit where the size of the array becomes infinite, the min-max 
filter problem requires at least 2 comparisons per element when no stream latency 
is allowed, and $\log_2 3$ comparisons per element otherwise. 

Proof. Let array values be distinct real numbers. When no stream latency is 
allowed, we must return the maximum and minimum of window $(i - w, i]$ using 
only the data values and comparisons in $[1, i]$. An adversary can choose the array 
value $a_i$ so that $a_i$ must be compared at least twice with preceding values: it takes 
two comparisons with $a_i$ to determine that it is neither a maximum nor a minimum 
($a_i \in (\min_{j \in (i-w, i]} a_j, \max_{j \in (i-w, i]} a_j)$). Hence, at least $2(n - w)$ comparisons are 
required, and because $2(n - w)/n \to 2$ as $n \to \infty$, two comparisons per element are 
required in the worst case. 

Next we assume stream latency is allowed. Browsing the array from left to 
right, each new data point $a_i$ for $i \in [w, n]$ can be either a new maximum 
($a_i = \max_{j \in (i-w, i]} a_j$), a new minimum ($a_i = \min_{j \in (i-w, i]} a_j$), or neither a new 
maximum nor a new minimum ($a_i \in (\min_{j \in (i-w, i]} a_j, \max_{j \in (i-w, i]} a_j)$). For any ternary 
sequence such as MAX-MAX-MIN-NOMAXMIN-MIN-MAX-..., we can generate a 
corresponding array. This means that a min-max filter needs to distinguish 
between more than $3^{n-w}$ different partial orders over the values in the array a. In 
other words, the binary decision tree must have more than $3^{n-w}$ leaves. Any binary 
tree having l leaves has height at least $\lceil \log_2 l \rceil$. Hence, our binary tree must have 
height at least $\lceil \log_2 3^{n-w} \rceil \ge (n - w)\log_2 3$, proving that $(1 - w/n)\log_2 3 \to \log_2 3$ 
comparisons per element are required when n is large. □ 

The next proposition shows that the general lower bound of 2 comparisons 
per element is tight, at least for a window of size 3. 

Proposition 1. There exists an algorithm to compute the min-max filter in no more 
than 2 comparisons per element when the window size is 3 (w = 3), with no stream 
latency. 

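Proof. Suppose we know the location of the maximum and minimum of the 
window $[i - 3, i - 1]$. Then we know the maximum and minimum of $\{a_{i-2}, a_{i-1}\}$. 
Hence, to compute the maximum and minimum of $\{a_{i-2}, a_{i-1}, a_i\}$, it suffices to 
determine whether $a_{i-1} > a_i$ and whether $a_{i-2} > a_i$. These two comparisons fix the 
order of all three values, hence the locations of the maximum and minimum of the 
new window $[i - 2, i]$, so the argument applies again at the next step. □ 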
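
The argument of Proposition 1 can be written as a short loop. This sketch assumes 0-based indexing and uses hypothetical names (MaxMin, maxmin_w3); it carries only the order of the last two values, which is the information the proof relies on, and it spends exactly two comparisons, $a_i > a_{i-1}$ and $a_i > a_{i-2}$, on each new element, plus one comparison to order the first pair.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical output type: maximum and minimum of one window of size 3.
struct MaxMin {
  double max;
  double min;
};

// Max-min filter for w = 3 with at most 2 comparisons per new element.
// `rising` remembers whether a[i-1] > a[i-2], learned at the previous step,
// so the pair {a[i-2], a[i-1]} is already ordered when a[i] arrives.
std::vector<MaxMin> maxmin_w3(const std::vector<double>& a, std::size_t& comparisons) {
  comparisons = 0;
  std::vector<MaxMin> out;
  if (a.size() < 3) return out;
  ++comparisons;
  bool rising = a[1] > a[0];
  for (std::size_t i = 2; i < a.size(); ++i) {
    const double hi = rising ? a[i - 1] : a[i - 2];  // larger of the known pair
    const double lo = rising ? a[i - 2] : a[i - 1];  // smaller of the known pair
    const bool above_prev = a[i] > a[i - 1];
    const bool above_prev2 = a[i] > a[i - 2];
    comparisons += 2;
    const bool new_max = above_prev && above_prev2;    // a[i] exceeds both
    const bool new_min = !above_prev && !above_prev2;  // a[i] exceeds neither
    out.push_back({new_max ? a[i] : hi, new_min ? a[i] : lo});
    rising = above_prev;  // order of the next known pair {a[i-1], a[i]}
  }
  return out;
}
```

The first of the two comparisons also gives the order of the next pair $\{a_{i-1}, a_i\}$, which is why no third comparison is needed.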

4. The Novel Streaming Algorithm 

To compute a running max-min filter, it is sufficient to maintain a monotonic wedge 
(see Fig. 4.3). Given an array $a = a_1, \ldots, a_n$, a monotonic wedge is made of two 
lists U, L where $U_1$ and $L_1$ are the locations of the global maximum and minimum, 
$U_2$ and $L_2$ are the locations of the global maximum and minimum in $(U_1, \infty)$ and 
$(L_1, \infty)$, and so on. Formally, U and L satisfy $\max_{i > U_{j-1}} a_i = a_{U_j}$ and 
$\min_{i > L_{j-1}} a_i = a_{L_j}$ for $j = 1, 2, \ldots$ where, by convention, $U_0 = L_0 = -\infty$. If all values of a are 
distinct, then the monotonic wedge U, L is unique. The location of the last data 
point n in a is the last value stored in both U and L (see $U_5$ and $L_4$ in Fig. 4.3). A 
monotonic wedge has the property that it keeps the location of the current (global) 
maximum ($U_1$) and minimum ($L_1$) while it can be easily updated as we remove 
data points from the left or append them from the right: 

◦ to compute a monotonic wedge of $a_2, a_3, \ldots, a_n$ given a monotonic wedge 
U, L for $a_1, a_2, \ldots, a_n$, it suffices to remove (pop) $U_1$ from U if $U_1 = 1$ or $L_1$ 
from L if $L_1 = 1$; 

◦ similarly, to compute the monotonic wedge of $a_1, a_2, \ldots, a_n, a_{n+1}$, if $a_{n+1} > a_n$, 
it suffices to remove the last locations stored in U until $a_{\mathrm{last}(U)} \ge a_{n+1}$ (or U is 
empty) or else, to remove the last locations stored in L until $a_{\mathrm{last}(L)} \le a_{n+1}$ (or L is 
empty), and then to append the location n + 1 to both U and L. 
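
The two update rules can be packaged as a small data structure. The C++ sketch below is illustrative only: MonotonicWedge and its member names are hypothetical, locations are 0-based, and push_back always drops the previous location from one of the two lists, as Algorithm 1 does. pop_front checks both lists so that a wedge holding a single location is emptied correctly.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical helper: a monotonic wedge over a contiguous range of `data`
// (0-based locations). U holds candidate maxima, L candidate minima.
class MonotonicWedge {
 public:
  explicit MonotonicWedge(const std::vector<double>& data) : a_(data) {}

  // Append location k on the right; k must follow the last appended location.
  void push_back(std::size_t k) {
    if (!U_.empty()) {
      assert(k == U_.back() + 1);  // U and L both end with location k - 1
      if (a_[k] > a_[k - 1]) {     // the new value rises: trim U
        U_.pop_back();
        while (!U_.empty() && a_[k] > a_[U_.back()]) U_.pop_back();
      } else {                     // the new value does not rise: trim L
        L_.pop_back();
        while (!L_.empty() && a_[k] < a_[L_.back()]) L_.pop_back();
      }
    }
    U_.push_back(k);
    L_.push_back(k);
  }

  // Remove location `first` from the left if it is a stored extremum.
  void pop_front(std::size_t first) {
    if (!U_.empty() && U_.front() == first) U_.pop_front();
    if (!L_.empty() && L_.front() == first) L_.pop_front();
  }

  std::size_t max_location() const { return U_.front(); }
  std::size_t min_location() const { return L_.front(); }

 private:
  const std::vector<double>& a_;
  std::deque<std::size_t> U_, L_;
};
```

Sliding a window of width w then amounts to calling push_back(k) followed by pop_front(k - w) once k ≥ w.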
Fig. 4.4 provides an example of how the monotonic wedge for window $[i - w, i - 1]$ 
is updated into a wedge for $[i - w + 1, i]$. In Step A, we begin with a monotonic 
wedge for $[i - w, i - 1]$. In Step B, we add value $a_i$ to the interval. This new value 
is compared against the last value $a_{i-1}$ and since $a_i > a_{U_5}$, we remove the index $U_5$ 
from U. Similarly, because $a_i > a_{U_4}$, we also remove $U_4$. In Step C, the index i is 
appended to both U and L and we have a new (extended) monotonic wedge. Then, 
we would further remove $L_1$, consider the next value forward, and so on. 

Fig. 4.3: Example of a monotonic wedge: data points run from left to right. 

Algorithm 1 and Proposition 2 show that a monotonic wedge can be used to 
compute the max-min filter efficiently and with few lines of code. 



Algorithm 1 Streaming algorithm to compute the max-min filter using no more 
than 3 comparisons per element. 

 1: INPUT: an array a indexed from 1 to n 
 2: INPUT: window width w > 2 
 3: U, L ← empty double-ended queues, we append to "back" 
 4: append 1 to U and L 
 5: for i in {2, ..., n} do 
 6:   if i ≥ w + 1 then 
 7:     OUTPUT: a_front(U) as maximum of range [i - w, i) 
 8:     OUTPUT: a_front(L) as minimum of range [i - w, i) 
 9:   if a_i > a_{i-1} then 
10:     pop U from back 
11:     while U is not empty and a_i > a_back(U) do 
12:       pop U from back 
13:   else 
14:     pop L from back 
15:     while L is not empty and a_i < a_back(L) do 
16:       pop L from back 
17:   append i to U and L 
18:   if i = w + front(U) then 
19:     pop U from front 
20:   else if i = w + front(L) then 
21:     pop L from front 
22: OUTPUT: a_front(U) and a_front(L) as maximum and minimum of range [n - w + 1, n] 
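
For readers who want to run Algorithm 1, here is a hedged C++ transcription rather than a tuned implementation. Assumptions: locations are 0-based, so the range [i - w, i) covers a[i - w], ..., a[i - 1]; the two while loops check for an empty queue before reading its back; and the last window is output after the loop. The name algorithm1 is hypothetical, and the comments point to the pseudocode line numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Transcription of Algorithm 1 with 0-based locations: maxval[j] and minval[j]
// receive the maximum and minimum of a[j], ..., a[j + w - 1].
void algorithm1(const std::vector<double>& a, std::size_t w,
                std::vector<double>& maxval, std::vector<double>& minval) {
  const std::size_t n = a.size();
  assert(w >= 1 && w <= n);
  maxval.assign(n - w + 1, 0.0);
  minval.assign(n - w + 1, 0.0);
  std::deque<std::size_t> U{0}, L{0};                         // lines 3-4
  for (std::size_t i = 1; i < n; ++i) {                       // line 5
    if (i >= w) {                                             // lines 6-8
      maxval[i - w] = a[U.front()];
      minval[i - w] = a[L.front()];
    }
    if (a[i] > a[i - 1]) {                                    // line 9
      U.pop_back();                                           // line 10
      while (!U.empty() && a[i] > a[U.back()]) U.pop_back();  // lines 11-12
    } else {
      L.pop_back();                                           // line 14
      while (!L.empty() && a[i] < a[L.back()]) L.pop_back();  // lines 15-16
    }
    U.push_back(i);                                           // line 17
    L.push_back(i);
    if (i == w + U.front()) U.pop_front();                    // lines 18-19
    else if (i == w + L.front()) L.pop_front();               // lines 20-21
  }
  maxval[n - w] = a[U.front()];                               // last window
  minval[n - w] = a[L.front()];
}
```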



Proposition 2. Algorithm 1 computes the max-min filter over n values using no 
more than 3n comparisons, or 3 comparisons per element. 

Proof. We prove by induction that in Algorithm 1, U and L form a monotonic 
wedge of a over the interval $[\max\{i - w, 1\}, i)$ at the beginning of the main loop 
(line 5). Initially, when i = 2, U = L = {1}, which is trivially a monotonic wedge. We 
have that the last component of both U and L is i - 1. If $a_i > a_{i-1}$ (line 9), then 
we remove the last elements of U until $a_{\mathrm{last}(U)} \ge a_i$ or U is empty (line 11); otherwise, 
if $a_i \le a_{i-1}$, we remove the last elements of L until $a_{\mathrm{last}(L)} \le a_i$ or L is empty (line 15). 
Then we append i to both U and L (line 17). The lists U, L form a monotonic wedge 
of $[\max\{i - w, 1\}, i]$ at this point (see Fig. 4.4). After appending the latest location i 
(line 17), any location j < i appears in at most one of U and L: indeed, i - 1 is 
necessarily removed from either U or L (line 10 or 14). To compute the monotonic 
wedge over $[\max\{i - w + 1, 1\}, i + 1)$ from the monotonic wedge over $[\max\{i - w, 1\}, i]$, 
we check whether the location i - w is in U or L at lines 18 and 20 and, if so, we 
remove it. Hence, the algorithm produces the correct result. 

Fig. 4.4: Algorithm 1 from line 5 to line 17: updating the monotonic wedge is done by either 
removing the last elements of U or the last elements of L until U, L form a monotonic wedge for 
$[\max\{i - w, 1\}, i]$. 

We still have to prove that the algorithm will not use more than 3n comparisons, 
no matter what the input data is. Firstly, the total number of elements that 
Algorithm 1 appends to queues U and L is 2n, as each i is appended both to U and L 
(lines 4 and 17). The comparison on line 9 is executed n - 1 times and each execution 
removes an element from either U or L (lines 10 and 14), leaving 2n - (n - 1) = n + 1 
elements to be removed elsewhere. Because each time the comparisons on lines 11 
and 15 give true, an element is removed from U or L, there can be at most n + 1 
true comparisons. Moreover, the comparisons on lines 11 and 15 can be false at most 
once for a fixed i, since a false comparison is the exit condition of the loop. The 
number of false comparisons is therefore at most n. Hence, the total number of 
comparisons is at most (n - 1) + (n + 1) + n = 3n, as we claimed. □ 

While some signals such as electroencephalograms (EEG) resemble i.i.d. noise, 
many more real-world signals are piecewise quasi-monotonic [Lemire et al. 2005]. 
One Gil-Kimmel variant [Gil and Kimmel 2002] has a comparison complexity of 
nearly 2 comparisons per element over i.i.d. noise, but a worst-case complexity of 
slightly more than 3 comparisons per element for monotonic data; the opposite is 
true of our algorithm, as demonstrated by the following proposition. 

Proposition 3. When the data is monotonic, Algorithm 1 computes the max-min 
filter using no more than 2 comparisons per element. 

Proof. If the input data is non-decreasing or non-increasing, then the conditions 
at line 11 and line 15 will never be true. Thus, in the worst case, for each new 
element, there is one comparison at line 9 and at most one at either line 11 or line 15. □ 

The next proposition shows that the memory usage of the monotonic wedge is at 
most w + 1 elements. Because U and L store only indexes, the corresponding data 
values must also be kept, so we say that the total memory buffer size of the 
algorithm is 2w + O(1) elements (see Table I). 

Proposition 4. In Algorithm 1, the number of elements in the monotonic wedge 
(size(U) + size(L)) is no more than w + 1. 

Proof. Each new element is added to both U and L at line 17, but in the next 
iteration of the main loop, this element is removed from either U or L (line 10 
or 14). Hence, at the end of each iteration, only the newest location i can appear in 
both U and L, and every stored location belongs to the window of w locations ending 
at i. Therefore size(U) + size(L) ≤ w + 1. □ 
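
Propositions 2, 3 and 4 can be checked empirically by instrumenting the same loop. In the sketch below (hypothetical names WedgeStats and algorithm1_stats, 0-based locations), every value comparison made at lines 9, 11 and 15 goes through one counting helper, and the largest size(U) + size(L) is recorded at the end of each iteration. Such counts only illustrate the bounds on sample inputs; they do not prove them.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical statistics gathered while running Algorithm 1.
struct WedgeStats {
  std::size_t comparisons = 0;  // value comparisons made at lines 9, 11 and 15
  std::size_t max_wedge = 0;    // largest size(U) + size(L) at the end of an iteration
};

// Algorithm 1 (0-based) without outputs, instrumented to count comparisons.
WedgeStats algorithm1_stats(const std::vector<double>& a, std::size_t w) {
  const std::size_t n = a.size();
  assert(w >= 1 && w <= n);
  WedgeStats s;
  std::deque<std::size_t> U{0}, L{0};
  s.max_wedge = U.size() + L.size();
  // Every value comparison goes through this counting helper: a[x] > a[y].
  auto greater = [&](std::size_t x, std::size_t y) {
    ++s.comparisons;
    return a[x] > a[y];
  };
  for (std::size_t i = 1; i < n; ++i) {
    if (greater(i, i - 1)) {                                      // line 9
      U.pop_back();                                               // line 10
      while (!U.empty() && greater(i, U.back())) U.pop_back();    // lines 11-12
    } else {
      L.pop_back();                                               // line 14
      while (!L.empty() && greater(L.back(), i)) L.pop_back();    // lines 15-16
    }
    U.push_back(i);                                               // line 17
    L.push_back(i);
    if (i == w + U.front()) U.pop_front();                        // lines 18-21
    else if (i == w + L.front()) L.pop_front();
    if (U.size() + L.size() > s.max_wedge) s.max_wedge = U.size() + L.size();
  }
  return s;
}
```

For instance, on 1000 pseudo-random values with w = 16, Proposition 2 caps the count at 3000 comparisons and Proposition 4 caps the wedge at 17 locations.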

5. Implementation and Experimental Results 

While interesting theoretically, the number of comparisons per element is not 
necessarily a good indication of real-world performance. We implemented our algorithm 
in C++ using the STL deque template. A more efficient data structure might be 
possible since the sizes of our double-ended queues are bounded by w. We used 
64-bit floating-point numbers ("double" type). In the pseudocode of Algorithm 1, 
we append i to the two double-ended queues, and then we systematically pop one 
of them (see the proof of Proposition 2). We found it slightly faster to rewrite the code 
to avoid one pop and one append (see appendix). The implementation of our 
algorithm stores only the locations of the extrema whereas our implementation of the van 
Herk-Gil-Werman algorithm stores values. Storing locations means that we can 



STREAMING MAXIMUM-MINIMUM FILTER 






compute the arg max/arg min filter with no overhead, but each comparison is slightly 
more expensive. While our implementation uses 32-bit integers to store locations, 
64-bit integers should be used when processing long streams. For small window sizes, 
Gil and Kimmel [2002] suggest unrolling the loops, essentially compiling w into 
the code: in this manner we could probably do away with a dynamic data structure 
and the corresponding overhead. 

We ran our tests on an AMD Athlon 64 3200+ using a 64-bit Linux platform 
with 1 gigabyte of RAM (no thrashing observed). The source code was compiled 
using the GNU GCC 3.4 compiler with the optimizer option "-O2". 

We process synthetic data sets made of 1 million data points and report wall 
clock timings versus the window width (see Fig. 5.5). The linear dependence of the 
naive algorithm's running time on the window width is quite apparent for w > 10, but 
for small window sizes (w < 10), it remains a viable alternative. Over i.i.d. noise 
generated with the Unix rand function, the van Herk-Gil-Werman algorithm and our 
algorithm are comparable (see Fig. 5.5(b)): both can process 1 million data points in 
about 0.15 s irrespective of the window width. For piecewise monotonic data such as a 
sine wave (see Fig. 5.5(a)), our algorithm is roughly twice as fast and can process 
1 million data points in about 0.075 s. Our C++ implementation of the Gil-Kimmel 
algorithm [Gil and Kimmel 2002] performed slightly worse than the van Herk-Gil-Werman 
algorithm. To ensure reproducibility, the source code is freely available from the author. 

6. Conclusion and Future Work 

We presented an algorithm to compute the max-min filter using no more than 3 
comparisons per element in the worst case, whereas the previous best result was 
3 + 2 log w/w + O(1/w) comparisons per element, slightly above 3. Our algorithm has 
lower latency, is easy to implement, and has reduced memory usage. For monotonic 
input, our algorithm incurs a cost of no more than 2 comparisons per element. 
Experimentally, our algorithm is especially competitive when the input is piecewise 
monotonic: it is twice as fast on a sine wave. 

We have shown that at least 2 comparisons per element are required to solve the 
max-min filter problem when no stream latency is allowed, and we showed that this 
bound is tight when the window is small (w = 3). 
Appendix: C++ source code for the streaming algorithm 

#include <cstddef>
#include <deque>
#include <vector>

// input: array a, integer window width w (1 <= w <= a.size())
// output: arrays maxval and minval, resized to a.size() - w + 1
// buffer: lists U and L of candidate locations; location i - 1 is kept
//         implicitly and pushed to one of the two lists at each step
// requires: STL for deque support
void streaming_maxmin(const std::vector<double>& a, std::size_t w,
                      std::vector<double>& maxval, std::vector<double>& minval) {
  std::deque<std::size_t> U, L;
  maxval.resize(a.size() - w + 1);
  minval.resize(a.size() - w + 1);
  for (std::size_t i = 1; i < a.size(); ++i) {
    if (i >= w) {
      maxval[i - w] = a[U.size() > 0 ? U.front() : i - 1];
      minval[i - w] = a[L.size() > 0 ? L.front() : i - 1];
    }  // end if
    if (a[i] > a[i - 1]) {
      L.push_back(i - 1);
      if (i == w + L.front()) L.pop_front();
      while (U.size() > 0) {
        if (a[i] <= a[U.back()]) {
          if (i == w + U.front()) U.pop_front();
          break;
        }  // end if
        U.pop_back();
      }  // end while
    } else {
      U.push_back(i - 1);
      if (i == w + U.front()) U.pop_front();
      while (L.size() > 0) {
        if (a[i] >= a[L.back()]) {
          if (i == w + L.front()) L.pop_front();
          break;
        }  // end if
        L.pop_back();
      }  // end while
    }  // end if else
  }  // end for
  maxval[a.size() - w] = a[U.size() > 0 ? U.front() : a.size() - 1];
  minval[a.size() - w] = a[L.size() > 0 ? L.front() : a.size() - 1];
}



